Font and Size Identification in Telugu Printed Document

Author

K.Ram Mohan Rao

Abstract

Telugu, whose script derives from the ancient Brahmi script, is an official language and one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. While a large amount of work has been done on font and size identification for English and other languages, relatively little work has been reported on OCR systems for Telugu text. Font and size identification for Indian scripts is much more complicated than for other scripts because of the huge number of combinations of characters and modifiers. Font and size identification is a pre-processing step in OCR systems, so Telugu font and size identification is an important area of interest in the development of an Optical Character Recognition (OCR) system for Telugu script. The pre-processing tasks considered here are conversion of the grayscale image to a binary image, image clearing, segmentation of the text into lines, and separation of connected components. Zonal analysis is then applied, and the top-zone components are identified and checked for the presence of a tick mark. The aspect ratio and pixel ratio of each such component are calculated, and the font and size are identified by comparing these two parameters against a database. Simulation studies were carried out in MATLAB with a GUI for all the segmentation methods on the given Telugu text, and the observed results were good enough to identify the different fonts and sizes in that text.
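The pipeline in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB implementation, and every concrete choice in it is an assumption: a fixed global binarization threshold, line segmentation by horizontal projection profile, 8-connected components found by breadth-first search, a hypothetical "top zone" defined as the upper 30% of a text line, aspect ratio taken as width / height, pixel ratio taken as ink pixels / bounding-box area, and nearest-neighbour matching against the feature database. The function names (`binarize`, `segment_lines`, `connected_components`, `top_zone_components`, `features`, `identify`) are hypothetical; image clearing and the tick-mark test itself are not modeled.

```python
from collections import deque
from math import hypot

# Assumption: pixels are 0-255 grey levels, dark ink on a light page.
INK_THRESHOLD = 128  # hypothetical fixed global threshold


def binarize(gray, threshold=INK_THRESHOLD):
    """Grayscale -> binary: 1 marks an ink pixel, 0 marks background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_lines(binary):
    """Split a page into text lines with a horizontal projection profile.

    A line is a maximal run of rows containing at least one ink pixel;
    each line is returned as a half-open row range (start, end).
    """
    lines, start = [], None
    for r, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = r
        elif not has_ink and start is not None:
            lines.append((start, r))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


def connected_components(binary):
    """Label 8-connected ink regions with a BFS.

    Returns a list of (bbox, ink_count) in row-major discovery order, where
    bbox = (top, left, bottom, right) with inclusive coordinates.
    """
    if not binary:
        return []
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(((top, left, bottom, right), count))
    return comps


def top_zone_components(comps, line_start, line_end, fraction=0.3):
    """Hypothetical zonal split: keep components lying entirely in the
    upper `fraction` of the text line's height."""
    limit = line_start + fraction * (line_end - line_start)
    return [(bbox, n) for bbox, n in comps if bbox[2] < limit]


def features(bbox, ink_count):
    """Assumed definitions: aspect ratio = width / height,
    pixel ratio = ink pixels / bounding-box area."""
    top, left, bottom, right = bbox
    height, width = bottom - top + 1, right - left + 1
    return width / height, ink_count / (width * height)


def identify(aspect, pixel_ratio, database):
    """Nearest-neighbour lookup: return the (font, size) key whose stored
    (aspect ratio, pixel ratio) pair is closest in Euclidean distance."""
    def distance(label):
        ref_aspect, ref_pixel = database[label]
        return hypot(aspect - ref_aspect, pixel_ratio - ref_pixel)
    return min(database, key=distance)
```

With a reference table such as `{("font_A", 12): (2.0, 0.75), ("font_B", 14): (1.0, 0.5)}` (placeholder labels and values, not data from the paper), `identify(*features(bbox, n), table)` returns the closest label. Nearest-neighbour matching is only the simplest reading of "comparing these two parameters with the database"; a tolerance window per font and size would be an equally plausible design.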

Similar resources

Multi-font Optical Character Recognition System for Printed Telugu Text

The Telugu OCR systems currently available on the market recognize only specific Telugu fonts. This paper describes the development of a multi-font OCR system for printed Telugu characters using artificial neural networks. In this system, character classification is carried out using a multilayer neural network architecture.

Font and Function Word Identification in Document Recognition

An algorithm is presented that identifies the predominant font in which the running text in an English language document is printed. Frequent function words (such as the, of, and, a, and to) are also recognized as part of th... font would be used during recognition. This would reduce the confusion caused by training on many fonts and would effectively reduce the recognition problem to choosing the ...

FONT DISCRIMINATION USING FRACTAL DIMENSIONS

One of the problems related to OCR systems is the discrimination of fonts in machine-printed document images. This task improves the performance of general OCR systems. The methods proposed in this paper are based on various fractal dimensions for font discrimination. First, some predefined fractal dimensions were combined with directional methods to enhance font differentiation. Then, a novel fractal dime...

Identification of Telugu, Devanagari and English Scripts Using Discriminating Features

In a multi-script multi-lingual environment, a document may contain text lines in more than one script/language forms. It is necessary to identify different script regions of the document in order to feed the document to the OCRs of individual language. With this context, this paper proposes to develop a model to identify and separate text lines of Telugu, Devanagari and English scripts from a ...

An Adaptive Character Recognizer for Telugu Scripts Using Multiresolution Analysis, Associative Memory

The present work is an attempt to develop a commercially viable and robust character recognizer for Telugu texts. We aim at designing a recognizer that exploits the inherent characteristics of the Telugu script. Our proposed method uses wavelet multiresolution analysis to extract features and an associative memory model to accomplish the recognition task. Our system learns the ...

Publication date: 2013
